Collaborators: Tomo Tanaka, John Eykelenboom.
In our data we distinguish four states, marked by colour: blue, brown, pink, red. For the purpose of the model we merge blue and brown together into one state. Here is the result of the experiment from 38 cells.
The next figure shows the proportions of each colour as a function of time. The curves are smoothed with a running mean over the window of 5 time points.
The model consists of three states and a set of rules.
* A switch, \(t_{2, ref}\) was introduced to the model, selecting the reference for time \(t_2\). In the default position (\(t_{2, ref} = 1\)) the B\(\rightarrow\)P activation time is \(t_0-t_1 + \Delta t_2\). When the switch is set to \(t_{2, ref} = 0\), the activation time is \(t_0 + \Delta t_2\), that is, it can only occur after the nuclear envelope breakdown. The idea of the switch was to separate pink and red curves, as observed in some data sets.
I use a Markov chain approach. The next state is generated from the current state based on rules outlined above. The rates, \(k\), are converted into probabilities over a given time step \(\Delta t\) as \(Pr = 1 - e^{1 - k\Delta t}\). The transition time, \(t_1\), is generated before the simulation starts for a given cell.
The cell timeline is repeated for \(n_{\rm cell}\) times and then the colour proportions are found at each time point.
Here is an example of the model.
And here is an example of the the model where P\(\rightarrow\)R transition can occur only after \(t_0 = 0\).
There is a shiny app that allows tuning parameters in search for the best solution.
I attempted fitting model to our data. It is not an easy task, as the model is stochastic. After some testing I found that modified BFGS (a quasi-Newton method, also known as a variable metric algorithm, by Broyden, Fletcher, Goldfarb and Shanno, 1970) which allows box constraints (Byrd et al. 1995), gives reasonable results. Again, due to stochasticity of the model, this algorithm often finds false local minima. I run it several times to find the best minimum. It is a crude and time-consuming method, but gives results better than manual tuning.
Fitting is constrained to time points between -50 and 30 min. The minimized quantity is
\({\rm rms} = \sqrt{\sum_{c \in \{B,P,R\}} \sum_i (O_{c,i} - E_{c,i})^2}\)
that is, the square root of the sum of squared residuals over all time points and all colours.
First, we fit the model with \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\) free. To improve the chances of finding the correct global minimum, I run the fit with 10,000 cells and 100 tries.
We suspect that \(t_0\) might not be exactly zero. Hence I fit the model with \(t_0 = 0, -5, -10, -15\) minutes.
I think the issue here is not \(t_0\) but the fact that the red curve growths is too fast and the model cannot do it.
Here are fit results for all conditions with \(t_0\) fixed at zero and four parameters free: \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\).
Here are results from a modified model, where P\(\rightarrow\)R transition can happen only after \(t_0\). By default \(t_0 = 0\) and the results are shown in the left panels below. As you can see, the red curve in the model lags behind the red curve in the data. Hence, I fitted the data again, but fixing \(t_0 = -10\) min this time (right panels). This aligns the red curves a little better.
We can also see that the pink curve is more peaked, as opposed to the smooth rise and decay in the default model.
It is not easy to find confidence intervals in fit parameters when not only the model is non-analytic, but also stochastic. The only feasible approach it bootstrap, which is computationally expensive. Here I give it a try.
Running bootstraps on the cluster. This is quite slow, so I do it bit by bit.
Here is the distribution of bootstrap result. Each bootstrap gives a set of fit parameters. Figures below show the distribution of fit parameters across all bootstraps.
And here are fit parameters and their 95% confidence intervals:
Here I create cell-line plots for all data sets using four colours.
PDF files:
| link |
|---|
| scramble |
| NCAPD2 |
| NCAPD3 |
| SMC2 |
| WAPL24 |
| WAPL48 |
| MK1775 |
| MK1775_ICRF193 |
| RAD21 |
| TT103 |
| TT108 |
Here I calculate the density of brown boxes and the distribution of inter-arrival times between them. If brown points are distributed randomly with density \(\lambda\), the intervals between them should follow the exponential distribution with PDF
\(f(d) = \lambda e^{-\lambda d}\)
where \(d\) is the inter-arrival time between consecutive brown points. Because some of the cells have much shorter data tracks, they will bias the inter-arrival distribution towards shorter intervals. The longer distances are missing from these short tracks. To account for this I made a simple model, in which I take cell tracks from the actual data, fill them randomly with brown points and calculate inter-arrival distribution. This is done 100 times to smooth the distribution.
The example below shows the result for scramble. The black bars represent data, the orange points show the predicted theoretical distribution \(f(d) = \lambda e^{-\lambda d}\) and the open blue bars show the simulated random distribution. As we can see, the simulated distribution predicts more short intervals with respect to the theoretical distribution, as expected.
I use a window of [-300, -20] min.
However, this is not what we really want. Inter-arrival time between individual one-minute boxes assumes that each box is one event. But in reality, brown event can last less or more than one minute. If we see, e.g., four brown boxes next to each other, it is unlikely that this shows four one-minute events. Much more likely, this is one event that lasts four minutes. Hence, considering the distribution of times between one-minute boxes and comparing it to a Poisson distribution of random one-minute events does not make much sense.
Instead, we should consider the distribution of full brown events, short and long. Below, I show the distribution of duration of brown events.
NOTE: the blue open bars show the simulated data (see details below). There is a slight difference between real and simulated data. This is because for each simulation (which is repeated 100 times) gaps in real data are filled with random data. Hence, the simulated data shows the most likely real data, with imputation.
We can get more insight from the distribution of intervals between the brown events. Here I use the gaps between the events, not taking brown begin/end into account. For example, if “+” is brown and “-” is blue then this cell track
---+--++++---+-
shows three brown events of length 1, 4, and 1 and two “gap” events of lengths 2 and 3. The blue sequences at the beginning and at the end are ignored, because we don’t know how long they are, we only see part of them.
This figure shows the distribution of gaps between brows events. The simulated distribution (open bars) is calculated in the following way:
If brown events were random, both data and simulated distributions would be similar. The p-value in the title shows the result from a chi-square test comparing data and simulation. The number “d” in each title is the density of brown boxes (\(\lambda\)).
As we can see, there is an excess of short gaps in comparison to a random distribution. This means that brown events tend to cluster together. In particular, the 1-min gaps are more prominent, suggesting that one brown event might trigger another, immediatelly after it. On the other hand, this is our timing resolution, so, to some extent, it might be an experimental artifact. For example, if we have a long brown event and one box in the middle is misidentified as blue, than it creates an artificial gap of size 1. If this happens a few times, we get the observed pattern.
Below, are results for all conditions. Both event duration (left) and gap distribution (right).
Here I compare event duration and gap duration distribution for each pair of conditions. I use chi-square test and the null hypothesis is that the distribution does not depend on the sample.
| scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| scramble |
|
0.17 | 0.44 | 0.38 | 0.099 | 0.45 | 0.76 | 0.37 | 0.58 | 0.0029 | 0.37 |
| NCAPD2 | 0.17 |
|
0.63 | 0.58 | 0.27 | 0.046 | 0.77 | 0.34 | 0.11 | 0.46 | 0.76 |
| NCAPD3 | 0.44 | 0.63 |
|
0.49 | 0.55 | 0.3 | 0.9 | 0.36 | 0.35 | 0.34 | 0.54 |
| SMC2 | 0.38 | 0.58 | 0.49 |
|
0.7 | 0.14 | 0.95 | 0.75 | 0.62 | 0.13 | 0.67 |
| WAPL24 | 0.099 | 0.27 | 0.55 | 0.7 |
|
0.21 | 0.65 | 0.21 | 0.32 | 0.17 | 0.23 |
| WAPL48 | 0.45 | 0.046 | 0.3 | 0.14 | 0.21 |
|
0.42 | 0.23 | 0.69 | 0.015 | 0.093 |
| MK1775 | 0.76 | 0.77 | 0.9 | 0.95 | 0.65 | 0.42 |
|
0.6 | 0.75 | 0.12 | 0.91 |
| MK1775_ICRF193 | 0.37 | 0.34 | 0.36 | 0.75 | 0.21 | 0.23 | 0.6 |
|
0.42 | 0.018 | 0.8 |
| RAD21 | 0.58 | 0.11 | 0.35 | 0.62 | 0.32 | 0.69 | 0.75 | 0.42 |
|
0.01 | 0.38 |
| TT103 | 0.0029 | 0.46 | 0.34 | 0.13 | 0.17 | 0.015 | 0.12 | 0.018 | 0.01 |
|
0.14 |
| TT108 | 0.37 | 0.76 | 0.54 | 0.67 | 0.23 | 0.093 | 0.91 | 0.8 | 0.38 | 0.14 |
|
| scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| scramble |
|
0.27 | 0.19 | 0.59 | 0.62 | 0.57 | 0.6 | 0.72 | 0.75 | 0.039 | 0.24 |
| NCAPD2 | 0.27 |
|
0.53 | 0.8 | 0.76 | 0.88 | 0.91 | 0.81 | 0.85 | 0.15 | 0.29 |
| NCAPD3 | 0.19 | 0.53 |
|
0.51 | 0.19 | 0.96 | 0.41 | 0.34 | 0.68 | 0.15 | 0.55 |
| SMC2 | 0.59 | 0.8 | 0.51 |
|
0.62 | 0.93 | 0.99 | 0.77 | 0.85 | 0.05 | 0.46 |
| WAPL24 | 0.62 | 0.76 | 0.19 | 0.62 |
|
0.74 | 0.8 | 0.77 | 0.71 | 0.27 | 0.68 |
| WAPL48 | 0.57 | 0.88 | 0.96 | 0.93 | 0.74 |
|
0.95 | 0.83 | 0.45 | 0.77 | 0.78 |
| MK1775 | 0.6 | 0.91 | 0.41 | 0.99 | 0.8 | 0.95 |
|
0.93 | 0.65 | 0.23 | 0.92 |
| MK1775_ICRF193 | 0.72 | 0.81 | 0.34 | 0.77 | 0.77 | 0.83 | 0.93 |
|
0.57 | 0.67 | 0.84 |
| RAD21 | 0.75 | 0.85 | 0.68 | 0.85 | 0.71 | 0.45 | 0.65 | 0.57 |
|
0.061 | 0.46 |
| TT103 | 0.039 | 0.15 | 0.15 | 0.05 | 0.27 | 0.77 | 0.23 | 0.67 | 0.061 |
|
0.47 |
| TT108 | 0.24 | 0.29 | 0.55 | 0.46 | 0.68 | 0.78 | 0.92 | 0.84 | 0.46 | 0.47 |
|
These tables show that there is very little discernible difference between conditions. Only TT103 differs from some other conditions in gap distribution. Otherwise, we have no evidence to reject the null hypothesis (but we cannot accept it either!).
| scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | S_phase | G2_phase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S_phase | 0.12 | 8.4e-05 | 0.008 | 0.0038 | 0.0096 | 0.24 | 0.073 | 0.065 | 0.53 | 1.4e-06 | 0.0083 |
|
0.28 |
| G2_phase | 0.0039 | 3.5e-06 | 2.9e-05 | 7.7e-06 | 5e-05 | 0.0027 | 0.0049 | 0.0083 | 0.15 | 1.2e-10 | 0.00067 | 0.32 |
|
| scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | S_phase | G2_phase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S_phase | 2.9e-05 | 0.099 | 0.0011 | 0.0063 | 0.16 | 0.13 | 0.38 | 0.33 | 3.8e-05 | 0.23 | 0.13 |
|
0.26 |
| G2_phase | 0.068 | 0.8 | 0.56 | 0.031 | 0.19 | 0.74 | 0.44 | 0.68 | 0.11 | 0.71 | 0.087 | 0.49 |
|